Correctly handling download_database_on_pipeline_creation within a pipeline processor within a default or final pipeline #131236

Merged

masseyke
Member

We are supposed to load a geoip database even if download_database_on_pipeline_creation is set to false, as long as the geoip processor is referenced from a default or final pipeline. However, if the geoip processor lives in a pipeline that the default or final pipeline invokes through a pipeline processor, we fail to detect it. As a result the database is not downloaded, and all data is tagged with something like _geoip_database_unavailable_GeoLite2-City.mmdb instead of having geo data added to it.
Related: #96624

…peline processor within a default or final pipeline
@masseyke masseyke added >bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP auto-backport Automatically create backport pull requests when merged v8.19.0 v9.1.0 v9.2.0 v9.0.5 v8.18.5 labels Jul 14, 2025
@masseyke masseyke requested a review from joegallo July 14, 2025 19:18
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jul 14, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Collaborator

Hi @masseyke, I've created a changelog YAML for you.

@masseyke masseyke requested a review from Copilot July 14, 2025 19:20
Copilot

This comment was marked as outdated.

@masseyke masseyke requested a review from Copilot July 15, 2025 21:03
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes a bug where GeoIP databases were not being downloaded when download_database_on_pipeline_creation is set to false but the GeoIP processor is referenced through nested pipeline processors within default or final pipelines. The fix ensures proper recursive traversal of pipeline processors to detect GeoIP processors at any nesting level.

Key changes:

  • Enhanced GeoIP processor detection to recursively traverse pipeline processors
  • Added cycle detection to prevent stack overflow in pipelines with circular references
  • Comprehensive test coverage for nested pipeline scenarios
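
The recursive traversal with cycle detection described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from GeoIpDownloaderTaskExecutor.java: the class NestedGeoIpSketch, the Processor record, the referencesGeoIp method, and the pipeline names are all hypothetical stand-ins for the real ingest classes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of ingest pipelines; these are NOT Elasticsearch classes.
final class NestedGeoIpSketch {

    /** A processor with a type; "pipeline" processors also name the pipeline they invoke. */
    record Processor(String type, String targetPipeline) {
        static Processor geoip() { return new Processor("geoip", null); }
        static Processor pipeline(String name) { return new Processor("pipeline", name); }
        static Processor other(String type) { return new Processor(type, null); }
    }

    /** Returns true if the named pipeline reaches a geoip processor at any nesting depth. */
    static boolean referencesGeoIp(String pipelineName, Map<String, List<Processor>> pipelines) {
        return referencesGeoIp(pipelineName, pipelines, new HashSet<>());
    }

    private static boolean referencesGeoIp(String name, Map<String, List<Processor>> pipelines, Set<String> visited) {
        if (!visited.add(name)) {
            // Already seen: either a cycle, or a pipeline that was fully examined
            // without finding a geoip processor. In both cases stop descending.
            return false;
        }
        for (Processor processor : pipelines.getOrDefault(name, List.of())) {
            if ("geoip".equals(processor.type())) {
                return true;
            }
            if ("pipeline".equals(processor.type())
                && referencesGeoIp(processor.targetPipeline(), pipelines, visited)) {
                return true;
            }
        }
        return false;
    }

    /** Example data (assumed names): a default pipeline that reaches geoip two levels down, plus a cycle. */
    static Map<String, List<Processor>> examplePipelines() {
        return Map.of(
            "default-pipeline", List.of(Processor.other("set"), Processor.pipeline("enrich")),
            "enrich", List.of(Processor.pipeline("geo")),
            "geo", List.of(Processor.geoip()),
            "loop-a", List.of(Processor.pipeline("loop-b")),
            "loop-b", List.of(Processor.pipeline("loop-a")),
            "plain", List.of(Processor.other("set"))
        );
    }

    public static void main(String[] args) {
        Map<String, List<Processor>> pipelines = examplePipelines();
        System.out.println(referencesGeoIp("default-pipeline", pipelines)); // prints "true"
        System.out.println(referencesGeoIp("loop-a", pipelines));           // prints "false"
        System.out.println(referencesGeoIp("plain", pipelines));            // prints "false"
    }
}
```

In this sketch a single visited set serves both purposes: it breaks cycles, and it avoids re-examining a pipeline that was already searched, since any pipeline that contained a geoip processor would have short-circuited the whole search with true.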

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Files and descriptions:

  • GeoIpDownloaderTaskExecutor.java: Core logic enhancement to recursively detect GeoIP processors in nested pipeline processors with cycle detection
  • GeoIpDownloaderTaskExecutorTests.java: New test cases covering nested pipeline scenarios and recursive pipeline detection
  • docs/changelog/131236.yaml: Changelog entry documenting the bug fix

* this could lead to a geo database not being downloaded for the pipeline, but it doesn't really matter since the
* pipeline was going to fail anyway.
*/
logger.warn("Detected that pipeline [{}] is called recursively.", pipelineName);
Contributor

This ends up being too verbose in practice, so I think the logging has to go.

Contributor

@joegallo joegallo left a comment

Drop the logger call and ship it. Nice job!

@joegallo joegallo merged commit 2381e5d into elastic:main Jul 21, 2025
33 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts
9.1
9.0 Commit could not be cherrypicked due to conflicts
8.18 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 131236

masseyke added a commit to masseyke/elasticsearch that referenced this pull request Jul 21, 2025
…peline processor within a default or final pipeline (elastic#131236)
elasticsearchmachine pushed a commit that referenced this pull request Jul 21, 2025
…peline processor within a default or final pipeline (#131236) (#131639)
joegallo pushed a commit to joegallo/elasticsearch that referenced this pull request Jul 21, 2025
@joegallo joegallo added v8.19.1 and removed v8.19.0 labels Jul 21, 2025
@joegallo
Contributor

I've got the 9.0 backport in as #131649, and I've got that one set to (hopefully) automatically backport to 8.19 and 8.18. If it doesn't merge back cleanly, then I'll do more manual backporting. 🤷

elasticsearchmachine pushed a commit that referenced this pull request Jul 21, 2025
…peline processor within a default or final pipeline (#131236) (#131649)

Co-authored-by: Keith Massey <[email protected]>
joegallo added a commit to joegallo/elasticsearch that referenced this pull request Jul 21, 2025
…peline processor within a default or final pipeline (elastic#131236) (elastic#131649)

Co-authored-by: Keith Massey <[email protected]>
szybia added a commit to szybia/elasticsearch that referenced this pull request Jul 22, 2025
…king

* upstream/main: (100 commits)
  Term vector API on stateless search nodes (elastic#129902)
  TEST Fix ThreadPoolMergeSchedulerStressTestIT testMergingFallsBehindAndThenCatchesUp (elastic#131636)
  Add inference.put_custom rest-api-spec (elastic#131660)
  ESQL: Fewer serverless docs in tests (elastic#131651)
  Skip search on indices with INDEX_REFRESH_BLOCK (elastic#129132)
  Mute org.elasticsearch.indices.cluster.RemoteSearchForceConnectTimeoutIT testTimeoutSetting elastic#131656
  [jdk] Resolve EA OpenJDK builds to our JDK archive (elastic#131237)
  Add optimized path for intermediate values aggregator (elastic#131390)
  Correctly handling download_database_on_pipeline_creation within a pipeline processor within a default or final pipeline (elastic#131236)
  Refresh potential lost connections at query start for `_search` (elastic#130463)
  Add template_id to patterned-text type (elastic#131401)
  Integrate LIKE/RLIKE LIST with ReplaceStringCasingWithInsensitiveRegexMatch rule (elastic#131531)
  [ES|QL] Add doc for the COMPLETION command (elastic#131010)
  ESQL: Add times to topn status (elastic#131555)
  ESQL: Add asynchronous pre-optimization step for logical plan (elastic#131440)
  ES|QL: Improve generative tests for FORK [130015] (elastic#131206)
  Update index mapping update privileges (elastic#130894)
  ESQL: Added Sample operator NamedWritable to plugin (elastic#131541)
  update `kibana_system` to grant it access to `.chat-*` system index (elastic#131419)
  Clarify heap size configuration (elastic#131607)
  ...
szybia added a commit to szybia/elasticsearch that referenced this pull request Jul 22, 2025
…-tracking

joegallo added a commit to joegallo/elasticsearch that referenced this pull request Jul 22, 2025
…peline processor within a default or final pipeline (elastic#131236) (elastic#131649)

Co-authored-by: Keith Massey <[email protected]>
joegallo added a commit to joegallo/elasticsearch that referenced this pull request Jul 22, 2025
…peline processor within a default or final pipeline (elastic#131236) (elastic#131649)

Co-authored-by: Keith Massey <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Jul 22, 2025
…peline processor within a default or final pipeline (#131236) (#131649) (#131654)

Co-authored-by: Keith Massey <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Jul 22, 2025
…peline processor within a default or final pipeline (#131236) (#131649) (#131653)

Co-authored-by: Keith Massey <[email protected]>
@masseyke masseyke deleted the fix/geoip-download_database_on_pipeline_creation branch July 28, 2025 18:27
Labels
auto-backport Automatically create backport pull requests when merged backport pending >bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team v8.18.5 v8.19.1 v9.0.5 v9.1.0 v9.2.0
3 participants